Open-source cybersecurity AI — trained to think like a defender. 1.7B parameters, fine-tuned with LoRA on cybersecurity Q&A data.
NullByte Mini is a small language model fine-tuned specifically for cybersecurity education. It explains vulnerabilities, guides CTF learners, maps certification paths, and covers ethical hacking topics in plain English.
Built on SmolLM2-1.7B-Instruct and fine-tuned with LoRA on a free Google Colab T4 GPU. No expensive compute. Fully open source.
Try it on Hugging Face Spaces →
- CVE explanations — what the vulnerability is, severity, how it works
- CTF hints — guides your thinking without spoiling the flag
- Cert roadmaps — CEH, OSCP, CompTIA Security+, TryHackMe, HackTheBox
- Linux hacking — commands, tools, enumeration techniques
- Web security — OWASP Top 10, SQLi, XSS, SSRF, path traversal
- Privilege escalation — Linux and Windows PrivEsc techniques
- OSINT — reconnaissance and open source intelligence
- Cryptography — TLS, AES, RSA, hash attacks
- Network security — TCP/IP, Wireshark, firewall analysis
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "YOUR_USERNAME/nullbyte-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

def ask(question):
    # SmolLM2 uses the ChatML prompt format
    prompt = (
        "<|im_start|>system\n"
        "You are NullByte Mini, a cybersecurity AI mentor. English only.\n"
        "<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(
        outputs[0][inputs.input_ids.shape[1]:],
        skip_special_tokens=True,
    )

print(ask("What is a buffer overflow?"))
print(ask("Give me a Linux privilege escalation checklist"))
print(ask("Explain CVE-2026-4747 simply"))
```

```
nullbyte-mini/
├── README.md                    ← You are here
├── LICENSE                      ← Apache 2.0
├── requirements.txt             ← Dependencies
├── train_nullbyte.py            ← LoRA fine-tuning pipeline (run in Colab)
├── build_nullbyte_dataset.py    ← Dataset builder — generates 500+ Q&A pairs
├── inference.py                 ← Simple inference script
├── nullbyte_app.py              ← Gradio demo for Hugging Face Spaces
├── data/
│   └── nullbyte_dataset.json    ← Full training dataset (transparent)
└── docs/
    ├── TRAINING.md              ← Training details
    ├── EVAL.md                  ← Evaluation results
    └── ETHICS.md                ← Ethics and safety
```
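The full training data ships in `data/nullbyte_dataset.json`. As a rough sketch of what a record might look like (the actual schema used by `build_nullbyte_dataset.py` may differ; the field names and `make_example` helper here are assumptions):

```python
import json

def make_example(question, answer, category):
    # Hypothetical record shape for nullbyte_dataset.json;
    # the real build_nullbyte_dataset.py may use a different schema.
    return {"category": category, "question": question, "answer": answer}

example = make_example(
    "What is SQL injection?",
    "SQL injection occurs when untrusted input is concatenated into a query...",
    "web-security",
)
print(json.dumps(example, indent=2))
```

Keeping the dataset as plain JSON makes the training data fully auditable, which matters for a security-focused model.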
| Property | Value |
|---|---|
| Base model | SmolLM2-1.7B-Instruct |
| Method | LoRA (PEFT), rank 16 |
| Trainable params | ~1.5% of total |
| Dataset | nullbyte-cybersec-v1 (500+ examples) |
| Epochs | 3 |
| Hardware | Google Colab T4 (free tier) |
| Training time | ~90 minutes |
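The rank-16 LoRA setup from the table can be expressed with Hugging Face PEFT roughly as follows. This is a sketch: `lora_alpha`, `lora_dropout`, and `target_modules` are assumed values, not necessarily what `train_nullbyte.py` uses.

```python
from peft import LoraConfig

# Sketch of the LoRA configuration implied by the table above.
# Only r=16 comes from the table; the rest are assumptions.
lora_config = LoraConfig(
    r=16,                       # LoRA rank, per the table
    lora_alpha=32,              # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,          # assumed
    task_type="CAUSAL_LM",
)

# Applied to the base model with peft.get_peft_model(model, lora_config),
# this trains only the low-rank adapters — roughly 1.5% of parameters.
```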
Train your own version:

```shell
git clone https://github.com/YOUR_USERNAME/nullbyte-mini
cd nullbyte-mini
pip install -r requirements.txt
python build_nullbyte_dataset.py   # generate the dataset
python train_nullbyte.py           # fine-tune (run in Colab T4)
```

Manually evaluated on 50 held-out questions across 10 categories:
| Category | Rating |
|---|---|
| Vulnerability explanation | ⭐⭐⭐⭐ |
| CTF guidance | ⭐⭐⭐⭐ |
| Certification roadmaps | ⭐⭐⭐⭐⭐ |
| Linux commands | ⭐⭐⭐⭐ |
| Web security | ⭐⭐⭐⭐ |
| Factual accuracy | ⭐⭐⭐⭐ |
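The ratings above come from manual review. One way to organize such a run is to batch the held-out questions through the model and save the answers for rating; a minimal sketch (the question set and `eval_answers.json` filename are hypothetical, and `ask` stands in for the inference helper shown earlier):

```python
import json

# Stub for the real inference helper defined in the usage example above.
def ask(question):
    return "(model answer)"

# Hypothetical held-out questions, grouped by category.
held_out = {
    "web-security": ["What is SSRF?"],
    "ctf-guidance": ["How should I approach a stego challenge?"],
}

# Collect one record per question so a human can rate each answer.
results = [
    {"category": cat, "question": q, "answer": ask(q)}
    for cat, questions in held_out.items()
    for q in questions
]

with open("eval_answers.json", "w") as f:
    json.dump(results, f, indent=2)
```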
NullByte Mini is for education and defense only.
✅ Learning security concepts, CTF practice, cert prep, understanding attacks to defend against them
❌ Creating malware, attacking systems you don't own, any illegal activity
Pull requests welcome — especially more training examples.
```shell
git clone https://github.com/YOUR_USERNAME/nullbyte-mini
git checkout -b your-feature
git commit -m "add: your change"
git push origin your-feature
# open a Pull Request
```

Apache 2.0 — free to use, modify, and build on.
```bibtex
@misc{nullbyte-mini-2026,
  title = {NullByte Mini: Open-Source Cybersecurity Language Model},
  year  = {2026},
  url   = {https://github.com/YOUR_USERNAME/nullbyte-mini}
}
```

NullByte Mini · Apache 2.0 · nullbyte-mini · Open source